Modelling drought in South Africa: meteorological insights and predictive parameters

South Africa has grappled with recurring drought for over two decades, leading to substantial economic losses. The drought in the Western Cape between 2015 and 2018, especially in Cape Town, was declared a national disaster and resulted in strict water rationing and the “day zero” effect. This study presents a set of simulations for predicting drought over South Africa using an Artificial Neural Network (ANN), with the Standardized Precipitation Index (SPI) as the drought indicator, in line with the recommendations of the World Meteorological Organization (WMO). Different meteorological variables and an aerosol parameter were used to develop the drought simulation sets at four distinct locations in South Africa for a 21-year period. The data used include relative humidity (rh), temperature (tp), soil wetness (sw), evapotranspiration (et), evaporation (ev), sea surface temperature (st), and aerosol optical depth (aa). The obtained R² values at Spring Bok ranged from 0.49 to 0.84 for SPI3 and from 0.22 to 0.84 for SPI6; at Umtata, from 0.83 to 0.95 for SPI3 and from 0.61 to 0.87 for SPI6; at Cape Town, from 0.78 to 0.94 for SPI3 and from 0.57 to 0.95 for SPI6; and at Upington, from 0.77 to 0.95 for SPI3 and from 0.78 to 0.92 for SPI6. These findings underscore the significance of evapotranspiration (et) as a pivotal parameter in drought simulation. Additionally, the predictive accuracy of the parameter combinations varied distinctly across locations, even for the same set of parameters. This implies that there is no single universal scheme for drought prediction. Hence, the results are important for simulating future drought scenarios in different parts of South Africa. Finally, this study shows that ANN is an effective tool for drought studies and simulations.


Introduction
Drought, recognized as a prolonged and extensive scarcity of natural water resources, is a destructive natural phenomenon with far-reaching consequences (WHO, 2023; USGS, 2023). Its occurrence is not confined to specific regions, and it affects diverse areas across the globe (Hao et al., 2018). However, the manifestation and severity of drought depend on regional climate, resulting in varying impacts on societies and environments (WMO, 2012). A drought period is mainly characterized by dry weather caused by low rainfall and high temperatures. According to Heim (2002), the consequences of drought span multiple domains such as health, agriculture, the economy, energy, and the environment. Moreover, the interplay between drought and climate change is increasingly evident, with rising temperatures exacerbating the severity and frequency of drought occurrences (Mukherjee et al., 2018; Shamshirband et al., 2020). Variations in climate patterns such as El Niño and La Niña can also lead to drought. El Niño, which is characterized by warmer ocean temperatures, can increase the occurrence of drought in the USA and Southern Africa, while La Niña, associated with cooler ocean temperatures, can lead to drought in Australia (Rescue, 2023). Furthermore, changes in the jet stream can cause drought when dry air from one part of the globe is carried to other areas.
Anthropogenic activities, including urbanization, deforestation, and alterations in land use, have also compounded drought effects (Deo et al., 2017; IPCC, 2012; McAlpine et al., 2009). These changes modify the energy balance within ecosystems, leading to increased aridity, evapotranspiration, soil albedo, and temperatures. Despite concerted efforts, drought remains one of the most complex and least understood natural hazards (see Obasi, 1994; Kiem et al., 2016; Sundararajan et al., 2021). Its negative effects on the environment, health, agriculture, energy, and the economy stem from the pivotal role water plays in the everyday existence of all living organisms, including humans. Effective mitigation therefore requires comprehensive approaches involving water conservation, improved agricultural practices, and the development of drought-resistant crops.
Consequently, implementing early warning systems equipped with models and drought contingency plans, often referred to as "drought response protocols," is imperative across all sectors of the economy. This underscores the need for continuous drought prediction and modelling, which are crucial to effective awareness, monitoring, and mitigation. Addressing these aspects can significantly reduce the detrimental impacts of drought on society, in both the short and the long term. Effective drought monitoring requires robust modelling techniques, which are instrumental in devising mitigation strategies (Gorgij et al., 2022). This involves establishing correlations between the indices of interest and predictor variables (Hao et al., 2018). However, the intricate and nonlinear nature of drought makes it difficult to model, and no single model is superior at all locations. In addition, the variables that drive drought change with the local environment, which presents a substantial hurdle. As a result, most models are localized, relying on historical or reanalysis data to derive predictors (Hao et al., 2018).
A range of models has been developed globally to forecast drought, each with a different degree of accuracy depending on the technique used and the geographical location. For instance, Shamshirband et al. (2020) predicted the SPI stream run-off using Gene Expression Programming (GEP), M5 model tree (M5), and Support Vector Regression (SVR), obtaining different R² values: 0.396 and 0.691 for GEP, 0.701 and 0.829 for M5, and 0.853 and 0.898 for SVR. Rafiei-Sardooi et al. (2018) modelled drought in Iran employing ARIMA and neuro-fuzzy techniques, resulting in R² values of 0.20 and 0.63 for SPI3, and 0.50 and 0.79 for SPI12, respectively. Adnan et al. (2021) utilized a hybrid Random Vector Functional Link-Hunger Games Search (RFVL-HGS) approach, yielding an accuracy of R² = 0.815 for SPI3 and R² = 0.829 for SPI6. Similarly, Gorgij et al. (2022) in Iran reported correlations ranging from 0.836 to 0.856 for SPI3 and 0.889 to 0.909 for SPI6 using Long Short-term Memory (LSTM). Khan et al. (2020) employed SVM, ANN, and KNN to categorize drought in Pakistan, obtaining varied correlation ranges for different drought classifications. Furthermore, Mouatadida et al. (2018) compared models and found that ANN outperformed extreme learning machine (ELM), multiple linear regression (MLR), and least squares support vector regression (LSSVR) in predicting drought in Eastern Australia. Deo et al. (2017) combined various parameters and models, achieving R² values ranging from 0.994 to 0.989, 0.944 to 0.988, and 0.916 to 0.987 using MARS, LSSVM, and M5Tree, respectively, in Eastern Australia.
In the context of South Africa, persistent drought occurrences have been a challenge over the past four decades (Meza et al., 2021). Notable among these drought episodes is the severe 2015-2017 period, which significantly reduced water availability in the Western Cape and Eastern Cape provinces (Mahlalela et al., 2020; Omar & Abiodun, 2020). This crisis led to the declaration of a national disaster in the Western Cape (Visser, 2018), resulting in substantial losses, especially in the agricultural sector (Pienaar & Boonzaaier, 2018), and the well-documented "day zero" in Cape Town, impacting around 3.7 million people (Sousa et al., 2018; Burls et al., 2019; Pascale et al., 2020). Various factors contribute to drought in South Africa, including changing rainfall patterns attributed to shifts in jet streams and storm tracks (Mahlalela et al., 2018; Sousa et al., 2018), Hadley Cell expansion in the Southern Hemisphere (Burls et al., 2019), and ocean-atmosphere interactions (Chivangulula et al., 2023). Additionally, increased water demand due to urbanization, water mismanagement, and insufficient investment in water reservoir infrastructure and agriculture exacerbate the situation (Mahlalela et al., 2020; Schreiner et al., 2018; Meza et al., 2021). Continued climate change and global warming are anticipated to perpetuate the prevailing drought conditions in the coming years (Abiodun et al., 2018; Engelbrecht et al., 2009; Naik & Abiodun, 2019). The predominant focus of drought mitigation responses in South Africa has been the agricultural sector (du Pisani et al., 1998; Kamali et al., 2018; Magombeyi & Taigbenu, 2008; Masupha & Moeletsi, 2020; Muyambo et al., 2017; Schwarz et al., 2020), particularly at the national level.
Notably, Nxumalo et al. (2022) highlighted the adverse impact of drought on wheat production in South Africa, especially in Free State and Mpumalanga provinces, resulting in significant reductions in production levels (Chikoore & Jury, 2021; Chivangulula et al., 2023). Furthermore, Nemukula et al. (2023) employed the Schlather model to analyze drought characteristics in the Lowveld region of Limpopo. Additionally, Naik and Abiodun (2019) used CORDEX models, obtaining correlations of 0.38 using the Standardized Precipitation-Evapotranspiration Index (SPEI) and 0.24 for SPI when comparing observed and simulated droughts. Mathivha et al. (2020) forecasted drought in the Vhembe District of Limpopo using Generalized Additive Models (GAM), Ensemble Empirical Mode Decomposition (EEMD)-GAM, EEMD-Autoregressive Integrated Moving Average (ARIMA)-GAM, and Forecast Quantile Regression Averaging (fQRA) models, each yielding a different correlation range: 0.48 to 0.95 for GAM, 0.79 to 0.95 for EEMD-GAM, 0.94 to 0.99 for EEMD-ARIMA-GAM, and 0.92 to 0.99 for fQRA. Ikegwuoha and Dinka (2020) simulated drought in the Lepelle River Basin (LRB) in South Africa using the GCM, resulting in a correlation of 0.836 and predicting increased drought conditions in the twenty-second century. These studies show that different drought models perform differently, highlighting the complexity and uncertainty of drought modelling. Hence, the aim of this study is to create models for predicting meteorological drought at selected locations in South Africa (Cape Town (CPT), Umtata (UMT), Spring Bok (SB), and Upington (UPT)) using an Artificial Neural Network (ANN) and meteorological variables. This research further identifies the optimal parameter set for simulating meteorological drought at each location, using SPI indices at 3-month and 6-month timescales.

Study area
South Africa spans latitudes 22° S to 35° S and longitudes 17° E to 33° E (refer to Fig. 1) and is situated in the southernmost part of Africa. Encompassing an area of approximately 1,219,602 km², it has a coastline extending over 3000 km. The landscape is topographically diverse, with an arid desert in the northwestern region and somewhat arid conditions along the eastern coast. Positioned within the "drought belt" of sub-Saharan Africa, South Africa contends with water constraints, as do numerous neighboring nations. The rainfall season varies across geographical areas: in the Western Cape, peak rainfall occurs during winter, whereas other regions receive their highest precipitation in summer. On average, annual precipitation over South Africa totals approximately 464 mm (see Table 1 for the average precipitation at each study location). South Africa can be divided into two major physiographic regions, separated by the Great Escarpment: the interior plateau, and the land between the plateau and the coast. The interior plateau is part of the great African plateau, which stretches to the Sahara Desert (Webber, 2008; SAG, 2024). Temperatures in South Africa are lower than in other countries at similar latitudes because of its high elevation. Summer temperatures typically range from 15 to 36 °C, while winter temperatures vary between −2 and 26 °C. The interior plateau, with an altitude of 1694 m, maintains an average summer temperature below 30 °C and freezing night temperatures during winter, while coastal regions are relatively warm during winter. Furthermore, there is a temperature gradient between the east and west coasts, associated with the warm Agulhas Current sweeping along the east coast and the cold Benguela Current along the west coast (SAG, 2024).
To the north of South Africa lie Namibia, Botswana, and Zimbabwe, while Mozambique and Eswatini border its eastern side. Additionally, Lesotho is an enclave within South Africa. The Indian and Atlantic Oceans form South Africa's southern and western borders, respectively, and Cape Agulhas marks the point where the two oceans meet. For this study, the selected sites are Cape Town (CPT) in the Western Cape, Spring Bok (SB) and Upington (UPT) in the Northern Cape, and Umtata (UMT) in the Eastern Cape (refer to Table 1).
The Moderate Resolution Imaging Spectroradiometer (MODIS) sensors are installed on both the Terra and Aqua satellites, orbiting the Earth twice daily: Terra in the morning (north to south) and Aqua in the afternoon (south to north). These instruments provide valuable insights into climate dynamics and processes globally, spanning land, ocean, and atmospheric regions. This study specifically employed MODIS data. These data were set to the same grid as the MERRA 2 inputs. All the data used in this study covered a temporal span of 21 years (from 2000 to 2021), and monthly averages were computed for analysis.

Data integration
To integrate all the datasets for model creation, an Artificial Neural Network (ANN) was employed. The ANN, a technique used across various fields, including drought studies (as mentioned in the introduction), has demonstrated its reliability as a statistical analysis tool. The architecture of the ANN is structured as Input-Hidden Layer Neuron-Output (refer to Fig. 2). The input layer holds the combination of parameters, while the SPI value represents the target or output. The network was trained with the Levenberg-Marquardt (LM) algorithm, selected for its efficiency in minimizing error functions and reducing training time (Jang et al., 1997). The MATLAB tansig function, used as the transfer function from the input layer to the hidden layer and subsequently to the output, is given in Eqs. 1a and 1b (see also Okoh et al., 2019; Onyeuwaoma et al., 2021). The tansig function also introduces non-linearities into neural networks, enabling them to learn and model complex patterns and relationships within a set of data.
$$H_m = \tanh(I_{wm} \times I_m + B_1) \tag{1a}$$
Equation 1a connects the input layer matrix I_m to the hidden layer matrix H_m, and Eq. 1b connects the hidden layer matrix to the output layer matrix O_m. I_m contains inputs for the neural network, H_m contains intermediary values computed within the hidden layer, and O_m contains outputs from the neural network. I_wm and H_wm are the respective weight matrices for the input and the hidden layers. B_1 and B_2 are the bias vectors for the input and the hidden layers. The weight matrices and the bias vectors contain constants for a given trained neural network.
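As a rough illustration of Eq. 1a, the sketch below runs one forward pass through a single-hidden-layer network in Python with NumPy (the study itself used MATLAB). The function name `forward_pass`, the array shapes, and the random weights are illustrative assumptions. Because the form of Eq. 1b is not reproduced in the text, applying tanh again at the output layer is also an assumption, based on the statement that the tansig function carries values from the hidden layer to the output.

```python
import numpy as np

def forward_pass(I_m, I_wm, B_1, H_wm, B_2):
    """Forward pass of a one-hidden-layer network.

    I_m  : (n_inputs, n_samples) input matrix
    I_wm : (n_hidden, n_inputs) input-to-hidden weight matrix
    B_1  : (n_hidden, 1) bias vector added before the hidden activation
    H_wm : (n_outputs, n_hidden) hidden-to-output weight matrix
    B_2  : (n_outputs, 1) bias vector added before the output activation
    """
    # Eq. 1a: hidden layer values through the tanh (tansig) transfer function.
    H_m = np.tanh(I_wm @ I_m + B_1)
    # ASSUMPTION: tanh is applied again at the output layer.
    O_m = np.tanh(H_wm @ H_m + B_2)
    return H_m, O_m

# Illustrative shapes only: 7 predictors, 5 monthly samples, 4 hidden neurons.
rng = np.random.default_rng(0)
I_m = rng.uniform(-1.0, 1.0, size=(7, 5))
I_wm = rng.normal(size=(4, 7))
B_1 = rng.normal(size=(4, 1))
H_wm = rng.normal(size=(1, 4))
B_2 = rng.normal(size=(1, 1))
H_m, O_m = forward_pass(I_m, I_wm, B_1, H_wm, B_2)
```

With a tanh output the network returns values between -1 and 1, so in this sketch the SPI target would have to be scaled into that range before training and scaled back afterwards.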
Subsequently, 70% of the collected data was allocated for training, while 15% was set aside for validation and another 15% for testing. This allocation has been applied in several other studies. The optimal number of hidden layer neurons (HLN) for each model was determined with a method developed by Dan Okoh (refer to Okoh, 2023). This method entailed generating 100 networks for each model (refer to Table 2 for the data combinations used in these models). The number corresponding to the network with the lowest root mean square error (RMSE) was selected as the HLN count.
The input datasets are relative humidity (rh), temperature (tp), evapotranspiration (et), evaporation (ev), soil wetness (sw), sea surface temperature (st), and aerosol optical depth (aa). These parameters were chosen for their links with drought conditions. At low relative humidity, the rate of transpiration increases, especially during drought. Droughts are also associated with clear skies, which increase incoming solar radiation and raise daytime temperatures. A rise in evapotranspiration (et) is an important trigger for seasonal drought in any given region, and it increases during drought. Warmer temperatures also enhance evaporation (ev), which reduces surface water and dries out soils and vegetation, making a period of low precipitation drier than it would be under cooler conditions. Hence, abnormal soil dryness is associated with drought. Sea surface temperature (SST), on the other hand, gives information on the heat content at the ocean surface. SST influences the Earth's climate (Gonsamo et al., 2016) through variations in climate factors (such as temperature, soil moisture, and precipitation) that are associated with drought (see Yan et al., 2019; Kumar et al., 2024). Atmospheric aerosols, measured as aerosol optical depth (aa), influence the hydrological cycle in the atmosphere depending on their optical properties. Some aerosol species scavenge the available water vapor, thereby exacerbating dry conditions. Furthermore, drought conditions increase the emission of aerosols into the atmosphere, especially dust-related species.
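The 70/15/15 allocation and the choice of the hidden-layer size by lowest RMSE can be sketched as follows. This is a minimal Python illustration and not the method of Okoh (2023): the helper names (`split_indices`, `select_hidden_neurons`), the random shuffling, and the `score_fn` callback, which stands in for training a network and returning its RMSE, are all assumptions.

```python
import numpy as np

def split_indices(n_samples, rng, train=0.70, val=0.15):
    """Shuffle sample indices and split them into train/validation/test."""
    idx = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    # Whatever remains after training and validation goes to testing.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def rmse(y_true, y_pred):
    """Root mean square error between two equally long series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def select_hidden_neurons(candidates, score_fn):
    """Return the candidate neuron count with the lowest score (RMSE)."""
    scores = {n: score_fn(n) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores

rng = np.random.default_rng(42)
train_idx, val_idx, test_idx = split_indices(200, rng)

# Toy stand-in for "train a network with n neurons and return its RMSE".
toy_score = lambda n: abs(n - 12) + 0.1
best_n, all_scores = select_hidden_neurons(range(1, 21), toy_score)
```

In a real run, `score_fn` would train a network with the given number of hidden neurons on the training indices and report its RMSE on held-out data.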
For the analysis, data spanning from 2000 to 2017 were used to train the network, while the remaining data (from 2018 to 2021) were reserved for validating the models.
Furthermore, an analysis of variance (ANOVA) test was used to statistically analyze the differences between simulated and measured SPIs. The F-statistic and the p-value, evaluated at a significance level of 0.05, determine whether the null hypothesis of equal means is accepted or rejected.
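To make the test concrete, the sketch below computes the F-statistic of a one-way ANOVA by hand in Python with NumPy. Treating the comparison as a one-way layout, with the measured SPI and each simulation set as one group, is an assumption about how the test was configured; the function name `one_way_f` and the toy numbers are illustrative only. The p-value would then be read from the F distribution with the returned degrees of freedom, a step not shown here.

```python
import numpy as np

def one_way_f(groups):
    """F-statistic and degrees of freedom for a one-way ANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Variation of the group means around the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = all_values.size - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return float(f_stat), df_between, df_within

# Toy groups: one "measured" series and two "simulated" series.
measured = [1.0, 2.0, 3.0]
sim_a = [2.0, 3.0, 4.0]
sim_b = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_f([measured, sim_a, sim_b])
```

A large F relative to the critical value for (df_b, df_w) corresponds to a p-value below 0.05, in which case the hypothesis that all group means are equal is rejected.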
The SPI quantifies precipitation deficiencies across varying time frames. This study focuses on meteorological drought using the SPI, recommended by the World Meteorological Organization (WMO) for its versatility in assessing drought over periods ranging from 3 to 24 months (McKee et al., 1993).
The SPI, which is a function of the probability density function of the gamma distribution g(x), is given as:
$$g(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0 \tag{2}$$
where α > 0 is the shape parameter, β > 0 is the scale parameter, and x is the rainfall measurement. The gamma function Γ(α) (University of Arizona, 2023) shown in the above equation is defined as
$$\Gamma(\alpha) = \int_{0}^{\infty} y^{\alpha-1} e^{-y}\, dy$$
while
$$\alpha = \frac{1}{4A}\left(1 + \sqrt{1 + \frac{4A}{3}}\right), \qquad \beta = \frac{\bar{x}}{\alpha}, \qquad A = \ln(\bar{x}) - \frac{\sum \ln(x)}{n}$$
where n is the number of rainfall measurements, and x̄ is the mean of x.
The SPI was categorized into distinct events based on the computed values, as detailed in Table 3. Under this scheme, a drought event begins when the SPI falls below zero and ends when it returns to a positive value.
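A minimal Python sketch of the SPI calculation outlined above is given below, using only NumPy and the standard library. It assumes strictly positive accumulated rainfall (months with zero rainfall would need the usual mixed-distribution adjustment, omitted here), estimates the gamma parameters with the common closed-form approximation based on A = ln(x̄) − Σln(x)/n, and evaluates the cumulative gamma probability with a simple power series. The function names and the synthetic rainfall are illustrative and are not the study's code or data.

```python
import math
from statistics import NormalDist

import numpy as np

def rolling_sum(precip, k):
    """k-month accumulated rainfall (k = 3 for SPI3, k = 6 for SPI6)."""
    return np.convolve(np.asarray(precip, dtype=float), np.ones(k), mode="valid")

def fit_gamma(x):
    """Closed-form approximation of the gamma shape and scale parameters."""
    x = np.asarray(x, dtype=float)
    A = math.log(x.mean()) - np.log(x).mean()
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * A / 3.0)) / (4.0 * A)
    beta = x.mean() / alpha
    return alpha, beta

def gamma_cdf(x, alpha, beta, terms=500):
    """Cumulative gamma probability via the series for the lower incomplete gamma."""
    t = x / beta
    if t <= 0.0:
        return 0.0
    term = 1.0 / alpha
    total = term
    for n in range(1, terms):
        term *= t / (alpha + n)
        total += term
    return total * math.exp(-t + alpha * math.log(t) - math.lgamma(alpha))

def spi(precip, k):
    """SPI at a k-month timescale for strictly positive accumulations."""
    acc = rolling_sum(precip, k)
    alpha, beta = fit_gamma(acc)
    std_normal = NormalDist()
    values = []
    for a in acc:
        # Clip so the inverse normal CDF never receives exactly 0 or 1.
        p = min(max(gamma_cdf(a, alpha, beta), 1e-6), 1.0 - 1e-6)
        values.append(std_normal.inv_cdf(p))
    return np.array(values)

rng = np.random.default_rng(1)
monthly_rain = rng.gamma(shape=2.0, scale=20.0, size=252)  # synthetic, in mm
spi3 = spi(monthly_rain, 3)
```

Negative entries of `spi3` mark months in which the 3-month accumulation falls below the fitted median, which is where a drought event would begin under the scheme described above.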

Results and discussion
Time series plots for SPI 3 and SPI 6
The primary objective of this study is to compute the Standardized Precipitation Index (SPI) for four specific locations at 3-month (SPI 3) and 6-month (SPI 6) timescales. Fig. 3 and Table 4 present the time series plots and the classification of droughts derived from calculating the SPI using Eq. 2.
Figure 3 indicates sustained drought across all locations during the study period. The most severe drought was observed at Spring Bok from 2016 to 2020, followed by Cape Town during the same timeframe. Fig. 4 reveals an escalating intensity of drought, most noticeable at Spring Bok (SB) between 2013 and 2021, at Cape Town (CPT) between 2014 and 2020, and at Umtata (UMT) between 2018 and 2021. Table 4 gives the percentage of SPI 3 drought occurrences and shows that mild drought was the predominant category across all locations: CPT (27.56%), SB (30.31%), UMT (24.8%), and UPT (31.5%). This was followed by moderate drought. Notably, the most severe drought conditions were recorded at UMT (9.84%), whereas SB experienced the highest proportion of extreme wetness (2.76%). The drought categories for SPI 6, also presented in Table 4, likewise show the prevalence of mild drought across all locations, with 30.68% at UPT and 29.88% at SB. The highest percentages of the most critical drought conditions occurred at UMT (8.77%) and SB (8.37%).
Further examination of Table 4 reveals that for SPI3, Cape Town (CPT) experienced drought conditions 50% of the time and was wet for the remaining 50%, similar to Spring Bok (SB) with 44.48% drought and 55.52% wet, and UPT with 50.4% drought and 49.6% wet. Looking at SPI 6, CPT encountered drought 48.21% of the time and was wet for 51.79%, while SB had 47.81% drought and 52.19% wet; UMT experienced … (… & Delworth, 2006). This could potentially account for the drought patterns observed around UPT, CPT, and SB, which are located along the Atlantic coast.

ANN model results
In this section, we examined the performance of the 11 distinct Artificial Neural Network (ANN) configurations, each using a different set of parameters, to develop models for each location at the two SPI timescales. Following the analysis, we determined the most effective model for each station. Figure 5 and Table 5 present the statistical analysis of the ANN models for the different sets of parameters considered. Fig. 5 illustrates how the network outputs compare with the targets. The dashed line represents the ideal scenario, the solid line reflects the actual performance, and the gap between them signifies the network's deviation from the ideal. In this instance, the R values indicate strong performance, consistently about 0.8 or higher. The remaining network statistics are detailed in Table 5. The table indicates that for SPI 3, the R values across all locations range from 0.71 to 0.99, with mean square error (MSE) values varying between 0.04 and 0.6 across the data divisions (training, validation, and testing). For SPI 6, the R values range from 0.50 to 0.99, and the MSE ranges from 0.09 to 0.99. These findings suggest that the models perform better in predicting SPI 3 than SPI 6. Moreover, Table 5 shows that the ANN training results were comparatively poorer at Spring Bok. Notably, in the analysis conducted at Cape Town, sets 6, 7, and 8 were excluded due to persistent errors encountered in the models.
Fig. 4 SPI6 time series for the study period
Typically, the inverse relationship between R and MSE (R ∝ 1/MSE) holds across all locations. The training, validation, and testing results follow a similar trend at Cape Town (CPT), Umtata (UMT), and Upington (UPT), but not at Spring Bok (SB). At SB, there are instances where either R or MSE is notably high for specific data divisions, as depicted in Figs. 15 and 16 in the Appendix.
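The two statistics discussed here can be computed as in the short Python sketch below, which evaluates the Pearson correlation coefficient R and the mean square error for a pair of measured and simulated series. The function name `r_and_mse` and the sample values are made up for illustration and are not taken from the study.

```python
import numpy as np

def r_and_mse(measured, simulated):
    """Pearson correlation coefficient and mean square error of two series."""
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    r = np.corrcoef(measured, simulated)[0, 1]
    mse = np.mean((measured - simulated) ** 2)
    return float(r), float(mse)

# Invented SPI-like values: one simulation tracks the target closely, one loosely.
measured = [-1.2, -0.4, 0.3, 0.9, 1.5]
close_fit = [-1.1, -0.5, 0.4, 0.8, 1.6]
loose_fit = [-0.2, -1.0, 1.1, 0.1, 0.7]

r_close, mse_close = r_and_mse(measured, close_fit)
r_loose, mse_loose = r_and_mse(measured, loose_fit)
```

In this toy case the closer fit has both the higher R and the lower MSE, which is the pattern described in the text.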
The outcomes of the model predictions are shown in Figs. 6, 7, 8, and 9. These results serve illustrative purposes and may not represent the best outcomes for all locations. To avoid repetition, the additional graphs are not presented here; they are provided in the Appendix.
Figure 6 presents the model results for Cape Town (CPT), with correlations ranging from 0.78 to 0.94 for SPI 3 and from 0.57 to 0.95 for SPI 6. Among these, set3 yielded the strongest correlation of 0.94 for SPI 3, whereas set2 and set5 produced the weakest correlations. For the longer-term prediction (SPI 6), set1 demonstrated the highest correlation of 0.95.
Furthermore, the findings suggest that evapotranspiration (et) serves as a superior substitute for evaporation (ev) at both timescales. Notably, using relative humidity (rh) and temperature (tp) alone does not yield favorable outputs compared with other parameters. Additionally, soil wetness (sw) emerges as a crucial parameter in drought modelling at this location.
For Spring Bok (see also Table 10 in the Appendix), the results revealed that most models exhibited poor performance, with correlations ranging from 0.49 to 0.84 for SPI3 and from 0.22 to 0.72 for SPI6. Among the simulations tested for SPI3, set4, set7, and set8 performed best, achieving a correlation of 0.84 each. Conversely, set2 exhibited the weakest correlation at 0.49. For SPI6, set7 showed the strongest correlation (0.72), while set2 recorded the lowest correlation at 0.22, followed by set8 (0.37). In most cases, the sets perform consistently at both timescales, suggesting that both timescales can be modelled effectively with the same dataset. Additionally, evapotranspiration (et) emerged as a critical parameter in these models. For effective drought simulation at this location, the best sets to use are 4, 7, 8, and 10 for SPI3 and sets 1 and 7 for SPI6.
The results obtained from Umtata, presented in Fig. 8 (see also Figs. 25, 26, 27, 28, 29, and 30 in the Appendix for the remaining comparison plots for UMT, and Table 11 in the Appendix), showed model correlations ranging between 0.82 and 0.95 for SPI3, and between 0.61 and 0.88 for SPI6. Notably, for both SPI3 and SPI6, set1 resulted in the most favorable outputs, yielding correlations of 0.95 and 0.88, respectively. Conversely, set3 was the weakest at 0.82 for SPI3, and set2 exhibited the lowest correlation of 0.61 for SPI6. Furthermore, incorporating evapotranspiration (et) enhanced simulation performance, especially for SPI3.
In Fig. 9 (see also Figs. 31, 32, 33, 34, 35, and 36 in the Appendix for the remaining comparison plots for UPT, and Table 12 in the Appendix), the findings for Upington revealed that simulation sets 5 and 10 yielded the most favorable outcomes for SPI3, while set2 (0.92) produced the best results for SPI6. Conversely, the weakest correlations were …
In general, the results obtained using ANN indicate that most combinations of parameters exhibit strong predictive capabilities for drought at SPI3, while specific combinations perform well for SPI6. In addition, the low p-values at most locations, which fall below the 0.05 significance level, indicate that the set combinations can effectively simulate drought at both timescales, with the exception of some simulations at Spring Bok, which proved otherwise (set2 (7.32e-08) for SPI3, and set2 (0.00127) and set9 (1.53e-07) for SPI6). In general, the higher the R², the lower the p-value. Across both timescales, the combinations with the strongest correlations consistently tend to include the evapotranspiration (et) and/or evaporation (ev) components. Noteworthy correlations of 0.95 for SPI3 were achieved at Umtata (UMT) (set1) and Upington (UPT) (set5 and set10), while the highest correlation for SPI6 (0.95) was attained with set1 at CPT. These outcomes are consistent with correlations found in prior studies in arid and semi-arid regions, as shown in Table 6. Consequently, across all locations examined, certain models can predict SPI with an R² of no less than 0.8.
Comparing with the results obtained for other locations given in Table 6, for semi-arid regions similar to the study sites (CPT, SB, UMT), it shows that for SPI3, Adnan et al. (2021) using RFVL-HGS … 2017) did not make a clear distinction between the temporal scales of their models, and our models compared well with theirs irrespective of the timescale. The results for an arid region in Iran obtained by Rafiei-Sardooi et al. (2018), which were R² of 0.50 and 0.20 for neuro-fuzzy and ARIMA respectively, were below the results we obtained for a similar location (Upington), which were 0.77–0.95 for SPI3 and 0.78–0.92 for SPI6. The range of results from our simulations implies that the performance of some of our combination sets is on par with that of these other models in both the arid and semi-arid regions considered. The variations between the 3- and 6-month timescales observed in our simulation sets were also visible at other locations, where the differences appear random: a model may perform better for SPI3 and fail for SPI6, and vice versa. Finally, the simulated results show that ANN drought simulations obtained for South Africa compare well with results obtained for similar locations. Hence, the robustness and effectiveness of ANN in modelling drought at different temporal scales can be established.

Comparison of measured and model SPI values
In this section, we compared the mean values of the measured and modelled results for the various locations at different timescales to ascertain the model with the most varying mean.
Figures 11, 12, 13, and 14 (also refer to Figs. 37, 38, and 39 in the Appendix for the remaining ANOVA plots) illustrate ANOVA plots comparing the modelled and measured SPI. In the top row (box plots), the central mark represents the median (second quartile), while the edges denote the 25th and 75th percentiles. The whiskers extend to the most extreme data points, excluding outliers, which are depicted by a + sign. The second row displays the comparison interval of the measured SPI (depicted by a blue bar) against the simulations; a simulation that differs from the measured SPI is represented by a red bar.
Table 7 shows the statistics of the ANOVA, while Table 8 summarizes the ANOVA results at both timescales, as shown in Figs. 11, 12, 13, and 14 and Figs. 39 and 40 in the Appendix. This result indicates how the mean values of the simulation sets vary from the measured data and between each set. "Pass" and "fail" were used to denote the p-values: pass implies that all the simulation sets are within the same mean limit as the measured SPI, and fail means that at least one of the simulation sets has a mean that differs from the measured SPI. The null hypothesis is therefore rejected whenever the result is "fail".
Fig. 12 ANOVA plots representing the measured and modelled SPI 3 for Spring Bok. The boxplots in A show the median for the different data sets, while B is the mean, and the blue line represents the mean of the measured data, while the gray lines represent simulated data
Table 7 shows that the mean values of all the simulations and the SPIs are similar at Umtata at both timescales and at Upington for SPI 3. By contrast, there are variations at Cape Town and Spring Bok at both timescales and at Upington for SPI 6. These are summarized in Table 8, where fail and pass designate whether the simulations differ from the measured SPI. At CPT, simulations 3, 5, and 11 consistently varied at both timescales, while simulation 9 varied at SPI 6 only. Consequently, simulations 1, 2, 4, and 10 are consistent with the measured SPI at both timescales, and simulation 9 at SPI 3 only. At SB, the mean values showed some variations for simulations 1, 2, 3, 5, 6, 9, and 11 at both timescales, and for simulations 10 and 7 at SPI 3 and SPI 6 respectively, while they are similar for simulations 4 and 8 at both timescales, for simulation 7 at SPI 3, and for simulation 10 at SPI 6. Furthermore, no variations were observed at Umtata, while only two variations were observed at Upington: simulation 8 for SPI 3 and simulation 5 for SPI 6. These results show that, while a high correlation suggests a strong association between variables, it does not imply that their means will be similar. Correlation measures the relationship or association, while means indicate the average values of the variables. This implies that although the means may differ, if the correlation is high, one variable might be predicted from the other with reasonable accuracy. This predictive power relies on the strength of the relationship, regardless of the difference in means.
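The distinction between correlation and agreement of means can be shown with a small Python example. The numbers are invented for illustration: the simulated series is the measured series shifted by a constant offset, so the two are perfectly correlated although their means differ.

```python
import numpy as np

# Invented SPI-like values; the "simulation" is the target plus a fixed bias.
measured = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
simulated = measured + 0.5

# Correlation ignores the constant offset between the two series.
r = np.corrcoef(measured, simulated)[0, 1]
# The difference in means is exactly the offset.
mean_gap = simulated.mean() - measured.mean()
```

Here a test on the means would flag the simulation as different from the measured series, while a linear relationship would still map one onto the other exactly.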
Fig. 14 ANOVA plots representing the measured and modelled SPI 3 for Upington. The boxplots in A show the median for the different data sets, while B is the mean, and the blue line represents the mean of the measured data, while the gray lines represent simulated data
A comparison of the correlation and ANOVA results (Tables 9,10,11,and

Summary and conclusion
This study demonstrates that South Africa has been affected by drought of varying magnitudes over the past two decades, ranging from mild to extreme conditions. UM, SB, and UPT experienced mild droughts for over 40% of the time, while CPT encountered them for around 50% of the time. CPT also witnessed moderate drought for approximately 17.72% of the time at SPI3, followed by SB at 12.60%. In terms of severity, CPT faced moderate to extreme drought around 22% of the time, yet it received comparatively more rainfall (approximately 19%) than the other locations. Overall, this investigation highlights that CPT and SB endured more frequent droughts than other areas studied, particularly those along the Atlantic coastlines.
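As a rough illustration of how such occurrence percentages can be derived from an SPI series, the sketch below classifies each value and tallies the share of time in each class. The thresholds are the commonly cited McKee et al. (1993) drought classes, and treating values of zero or above as "no drought" is an assumption made here; the function names and the sample values are hypothetical and are not the study data.

```python
from collections import Counter

CATEGORIES = ("no drought", "mild", "moderate", "severe", "extreme")

def spi_category(spi):
    """Map one SPI value to a drought class (McKee et al., 1993 thresholds)."""
    if spi >= 0.0:
        return "no drought"   # assumption: non-negative SPI counted as wet/normal
    if spi > -1.0:
        return "mild"         # 0 to -0.99
    if spi > -1.5:
        return "moderate"     # -1.00 to -1.49
    if spi > -2.0:
        return "severe"       # -1.50 to -1.99
    return "extreme"          # -2.00 and below

def category_percentages(series):
    """Percentage of time steps that fall in each drought class."""
    counts = Counter(spi_category(value) for value in series)
    return {name: 100.0 * counts.get(name, 0) / len(series) for name in CATEGORIES}

# Made-up monthly SPI values, for illustration only.
sample = [0.5, -0.2, -1.0, -1.7, -2.3, 1.1, -0.8, -1.2, 0.0, -0.4]
for name, share in category_percentages(sample).items():
    print(f"{name}: {share:.1f}%")
```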
The study utilized Artificial Neural Networks (ANN) to model drought in four South African locations, employing various meteorological and synoptic data combinations. The results revealed that ANN predicted SPI3 more accurately than SPI6 when the same parameters were used within the same location. Moreover, models incorporating the evapotranspiration (et) parameter performed better than those relying on evaporation (ev), suggesting the significance of et in drought prediction. Notably, a consistently high correlation was achieved from the "all" model across all locations and timescales. Similarly, models such as set3, set5, and set11 consistently modelled SPI more effectively than the others.
Consequently, the study shows that ANN can be effectively used to model drought at different timescales; this is demonstrated by the accuracy and versatility of the results obtained. Therefore, researchers, policy makers, and practitioners can leverage the accuracy of ANN to enhance their ability to simulate and manage drought, and to put in place mitigation strategies that alleviate its impact on both the economy and human water usage.
These results further emphasize the complexity and localization inherent in drought modelling, indicating that various parameters exhibit different performance levels across diverse locations. Consequently, the study emphasizes that a single set of parameters cannot universally characterize drought modelling due to its multifaceted nature.
Author contribution Onyeuwaoma Nnaemeka Dom designed the computational framework and analyzed the data. Onyeuwaoma Nnaemeka Dom carried out the simulations and implementations of the models. Onyeuwaoma Nnaemeka Dom wrote the manuscript with inputs from Sivakumar Venkataraman. Mahesh Bade processed the AOD data from MAIAC used in the models.
Funding Open access funding provided by University of KwaZulu-Natal.
Data availability All data used in this research will be freely available on request from any of the authors.

Declarations
Ethics approval We, Onyeuwaoma Nnaemeka, Sivakumar Venkatraman, and Mahesh Bade, the authors of the manuscript titled "Modelling Drought in South Africa: Meteorological Insights and Predictive Parameters", hereby affirm that we adhered to ethical principles throughout the research process that gave rise to this article.

Fig. 1 Map of study area showing the sample sites

Fig. 2 Sample schematic illustration of the neural network with Architecture 7-7-1 used in this study
Figs. 16, 17, 18, and 19 in the Appendix for the remaining comparison plots for CPT) and 10 (also Table 9 in the Appendix) represent

Fig. 5 Sample statistic of the input combinations for forecasting of SPI, showing the relationship between the output and target
specific location. Furthermore, in Tables 6, the p-values are used to ascertain the validity/accuracy level of the simulations at the 0.05 significance level. Hence, in addition to the high correlation values, the p-values should be ≤ 0.05 for the result to be significant. In Spring Bok, illustrated in Figs. 7 (also Figs. 20, 21, 22, 23, 24, and 25 in the Appendix for the remaining comparison plots for SB) and 10 (Table

Fig. 6 Sample of model results for Cape Town showing SPI 3 and SPI 6 time series alongside scatter plots for comparison

Fig. 7 Sample of model results for Spring Bok showing SPI 3 and SPI 6 time series alongside scatter plots for comparison

Fig. 8 Sample of model results for Umtata showing SPI 3 and SPI 6 time series and scatter plots for comparison

Fig. 9 Sample of model results for Upington showing SPI 3 and SPI 6 time series and scatter plots for comparison
Fig. 10 Correlation values of the different simulation sets at the different locations

Fig. 11 ANOVA plots representing the measured and modelled SPI 3 for Cape Town. The boxplots in A show the median for the different data sets, while B is the mean, and the blue line represents the mean of the measured data, while the gray lines represent simulated data

Fig. 13 ANOVA plots representing the measured and modelled SPI 3 for Umtata. The boxplots in A show the median for the different data sets, while B is the mean, and the blue line represents the mean of the measured data, while the gray lines represent simulated data

Table 1
Geographical information of the four locations selected for this study

Table 2
Set combinations used to generate the simulations

Table 3
Drought categories calculated from SPI (McKee et al., 1993)
T faced 49% drought and 51% wet conditions. This analysis suggests that short-term drought situations occur more frequently in CPT and SB, while longer-lasting drought conditions are more prevalent in UPT and CPT. The shift of the Inter-tropical Convergence Zone (ITCZ) southwards due to the warming of the Atlantic Ocean is similar to observed patterns over the Sahel region (Epule et al., 2014). The result is a decrease in moisture input from the ocean and a weakening of the West African Monsoon (WAM), which creates drier conditions, vegetation losses, and increased surface albedo (see Zeng, 2003; Bader & Latif, 2003; Zhang

Table 4
Percentage of occurrence of the different drought categories for study period

Table 5
Statistical details of input combinations utilized for forecasting SPI through an Artificial Neural Network (ANN). The measures include mean square error (MSE) and coefficient of determination (R), specifically for CPT and SB; NR implies no result, as specified earlier

Table 6
Comparison with some previous studies on SPI and SPEI modelling

Table 7
ANOVA statistics. F, F-statistic; Prob > F, p-value at the 0.05 significance level

Table 9
Summary of outcomes obtained from the models for Cape Town
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.